Discussion:
FOP - HTML2PDF
echoo
2011-01-25 08:32:15 UTC
Permalink
Dear


I have a problem which I don't know how to solve:

I have an xml file which I want to transform to:
- a pdf file.
- a xsl file.

For this, I use Apache FOP (as I am working in a Java environment).
The result of this is nice except for one thing:

My xml has one field which is called 'introduction' and which accepts
HTML contents.
After transformation, the plain html is shown in the pdf file.

I want the HTML to be interpreted.


Yours Sincerely



Christof
--
View this message in context: http://old.nabble.com/FOP---HTML2PDF-tp30748316p30748316.html
Sent from the FOP - Users mailing list archive at Nabble.com.
mehdi houshmand
2011-01-25 08:47:20 UTC
Permalink
Hi Christof,

Just to be clear, you've got an XML element that contains HTML and you
want that HTML to be interpreted as text (i.e. you want the HTML tags
removed)? If so, this isn't strictly a FOP question: FOP isn't
responsible for analysing/parsing/interpreting XML directly (though
admittedly it does accept XML as input with an XSLT to transform the
XML to FO). Anyway, the point is, you want that knowledge to be in the
XSL. The XSL/XSLT is responsible for parsing the XML and converting it
into FO. How you do that is a question for an XSLT forum, but one way
would be using regexes to return the string you want. One Google search
yielded http://www.xml.com/pub/a/2003/06/04/tr.html which seems like a
fairly nice little introduction to regexes in XSLT.
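
For the "tags removed" reading above, here is a minimal sketch in Java rather than in XSLT. XSLT 1.0 has no regular-expression functions, so doing this before the transformation is one option. The class and method names (TagStripper, stripTags) are hypothetical, and the pattern is deliberately naive: it does not handle comments, scripts, or a '>' inside an attribute value.

```java
import java.util.regex.Pattern;

class TagStripper {
    // Naive pattern: a '<', then one or more characters that are not '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns the input with everything that looks like a tag removed. */
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // prints "Hello world"
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
    }
}
```

This only yields plain text. It does not turn HTML structure such as tables into FO.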

I hope that helps

Mehdi
echoo
2011-01-25 13:33:03 UTC
Permalink
Dear Mehdi


Thank you for your reply.

What do you mean by 'interpreted as text'?
What I want is that if I have a <table> tag in the HTML content, a table
should be drawn in the resulting pdf file. I just don't know how :-) (yet)
Your link might indeed be useful.


Yours Sincerely




Christof
--
View this message in context: http://old.nabble.com/FOP---HTML2PDF-tp30748316p30757986.html
Sent from the FOP - Users mailing list archive at Nabble.com.
mehdi houshmand
2011-01-25 13:47:36 UTC
Permalink
Hi Christof,

Correct me if I'm wrong, but you're trying to extract the relevant
text from the HTML and convert that to FO objects in XML. If so, that
looks like a job for regex i.e. finding strings - in your case, you'd
be looking for <table>ANY STRING</table> (I presume) and insert that
text into FO elements. However, there's almost definitely a more
intuitive way to do that using XSLT, but that's not really the scope
of this forum. You want all that intelligence in the XSLT, you want
the XSLT to parse the HTML and create the necessary FO elements. XSLT
is a very powerful tool, and most likely someone else would have done
what you're trying to do or at least something similar that you
could... Uhm... *cough* plagiarize *cough*. My point is there's no
point reinventing the wheel, Google is your friend, check this out,
might be a good starting point:
http://stackoverflow.com/questions/1639625/can-i-parse-an-html-using-xslt.
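
One possible reason the raw markup shows up in the PDF is that the field holds escaped markup (for example &lt;table&gt;), so the stylesheet sees a single text node and has no <table> element to match. This is an assumption, because the thread does not show the XML. The sketch below is a pre-processing step in Java that re-parses the field into real elements before the XSLT runs. It assumes the stored HTML is well-formed XHTML; real-world HTML would first need a clean-up tool. The names IntroductionUnescaper, expandIntroduction and wrapper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class IntroductionUnescaper {
    /** Parses a string as XML; fails if the string is not well-formed. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /**
     * Replaces the escaped text inside the first <introduction> element
     * with the elements that text describes, so XSLT templates can match them.
     * Assumes the source document contains an <introduction> element.
     */
    static Document expandIntroduction(String sourceXml) {
        Document doc = parse(sourceXml);
        Element intro = (Element) doc.getElementsByTagName("introduction").item(0);

        // getTextContent() returns the unescaped string, e.g. "<table>...</table>".
        String html = intro.getTextContent();

        // Wrap the fragment so that several top-level elements still parse.
        Document fragment = parse("<wrapper>" + html + "</wrapper>");

        // Drop the old text node(s).
        while (intro.hasChildNodes()) {
            intro.removeChild(intro.getFirstChild());
        }
        // Copy the parsed nodes into the original document.
        Node child = fragment.getDocumentElement().getFirstChild();
        while (child != null) {
            intro.appendChild(doc.importNode(child, true));
            child = child.getNextSibling();
        }
        return doc;
    }
}
```

The resulting Document could then be handed to the transformation as a javax.xml.transform.dom.DOMSource instead of the original file.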

I hope that helps

Mehdi
Wim VN
2011-01-25 10:36:34 UTC
Permalink
Hello Christof,

I'm not sure but I think the solution can be found in XSLT and not in FOP.

If I understand correctly: you use an XSL transformation to go from a source
xml file to an intermediate XSL-FO file. Afterwards you process this with
Apache FOP to a final PDF document.
Within the xml there is a tag that holds html content. You wish this content
to be interpreted.

Is it not possible to have the XSL transformation look up that <introduction>
tag and make sure it converts the HTML content to FO as well?
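
A minimal sketch of that idea, assuming the <introduction> element contains real (unescaped) table, tr, td and th child elements. The stylesheet, the class name IntroductionToFo, the method toFo and the page dimensions are all illustrative assumptions, not the poster's actual files. Only the JDK's built-in XSLT processor is used here; the FO string it returns is what would then be passed to Apache FOP for the PDF step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IntroductionToFo {
    // Hypothetical stylesheet: maps HTML table markup inside <introduction> to XSL-FO.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Page scaffolding; only the introduction is placed in the flow.
      + "<xsl:template match='/'>"
      + "<fo:root><fo:layout-master-set>"
      + "<fo:simple-page-master master-name='page' page-height='29.7cm'"
      + " page-width='21cm' margin='2cm'><fo:region-body/></fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='page'>"
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<xsl:apply-templates select='//introduction'/>"
      + "</fo:flow></fo:page-sequence></fo:root>"
      + "</xsl:template>"
      + "<xsl:template match='introduction'>"
      + "<fo:block><xsl:apply-templates/></fo:block>"
      + "</xsl:template>"
      // HTML table -> FO table (rows directly under table or under tbody).
      + "<xsl:template match='table'>"
      + "<fo:table table-layout='fixed' width='100%'><fo:table-body>"
      + "<xsl:apply-templates select='tr | tbody/tr'/>"
      + "</fo:table-body></fo:table>"
      + "</xsl:template>"
      + "<xsl:template match='tr'>"
      + "<fo:table-row><xsl:apply-templates select='td | th'/></fo:table-row>"
      + "</xsl:template>"
      + "<xsl:template match='td | th'>"
      + "<fo:table-cell><fo:block><xsl:apply-templates/></fo:block></fo:table-cell>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet over the source XML and returns the XSL-FO as a string. */
    static String toFo(String sourceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFo(
            "<doc><introduction><table><tr><td>A</td><td>B</td></tr></table>"
          + "</introduction></doc>"));
    }
}
```

HTML elements without a template of their own (p, b, and so on) fall through to XSLT's built-in rules and contribute only their text, so each further element needs its own mapping.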

I am not an XSLT expert and if I'm not mistaken other forums might be a
better choice to get help on this specific problem.

Good luck with your project
Wim
--
View this message in context: http://old.nabble.com/FOP---HTML2PDF-tp30748316p30748881.html
Sent from the FOP - Users mailing list archive at Nabble.com.
echoo
2011-01-25 13:27:07 UTC
Permalink
Hello Wim VN


Thank you for your reply.

Yes, what you mention is exactly what I want to do.

I am not sure, but I believe I can use the XSL file (xhtml2fo.xsl) provided
by Antenna House (http://www.antennahouse.com/XSLsample/XSLsample.htm) to
look up the HTML translations. Is this what you mean by 'have the XSL
transformation look up that <introduction> tag'?


Yours Sincerely




Christof
--
View this message in context: http://old.nabble.com/FOP---HTML2PDF-tp30748316p30757603.html
Sent from the FOP - Users mailing list archive at Nabble.com.