Script: Downloading CRS data #33

Closed
opened 2020-06-09 11:51:36 +02:00 by nickdickinson · 7 comments
No description provided.
nickdickinson commented:
Similar project: https://github.com/datasets/dac-and-crs-code-lists
decentral1se commented 2020-06-09 15:46:01 +02:00 (Migrated from github.com)

OK, a quick sniff around and I can see that things are not very easy to smoke out, but I expected that 😄 So, the OECD API seems to be what we need: https://data.oecd.org/api/sdmx-json-documentation, giving, for example, the CRS dataset straight off the net: https://stats.oecd.org/sdmx-json/data/CRS1/all/all?startTime=2018 (where `CRS1` is the dataset code I pulled from https://stats.oecd.org/Index.aspx?DataSetCode=CRS1). That download is really slow though, so we might consider some secondary packaging system with less latency. Looking around for libraries, there is this `pandasdmx` package which supports interfacing with the OECD API directly: https://pandasdmx.readthedocs.io/en/v1.0/sources.html#oecd-organisation-for-economic-cooperation-and-development. I'll get stuck into this in the coming days.
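For reference, a minimal sketch of composing that query URL. The endpoint shape and `startTime` parameter are taken straight from the example URL above and have not been re-verified against current OECD documentation:

```python
# Sketch: build an SDMX-JSON query URL for the CRS1 dataset, following the
# pattern of https://stats.oecd.org/sdmx-json/data/CRS1/all/all?startTime=2018.
# Endpoint and parameter names come from that example, not from verified docs.

def crs_query_url(dataset="CRS1", filter_expr="all/all", start_time=2018):
    """Compose a stats.oecd.org SDMX-JSON data query URL."""
    base = "https://stats.oecd.org/sdmx-json/data"
    return f"{base}/{dataset}/{filter_expr}?startTime={start_time}"

url = crs_query_url()
# The response (fetched with e.g. urllib or requests) is SDMX-JSON,
# which pandasdmx can also parse into pandas structures for you.
```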
nickdickinson commented:

Let's consider just using the CSV downloads. I spent a bit of time looking, but I think SDMX is perhaps adding a layer of complexity that isn't useful yet. We probably want to go there eventually, but not necessarily for this data flow right now.

See the Python scraper I shared, which gets the main txt files from their file server and unzips them; the files are in the Dropbox folder too. I used them to combine CRS and GLAAS data on a recent project.
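The unzip-and-read step could look something like the sketch below, using only the standard library. The member name, delimiter, and column names are placeholders: check them against the actual CRS bulk files from the OECD file server.

```python
# Sketch: extract a .txt member from a CRS bulk-download zip and read it as
# delimited rows. The "|" delimiter and the column names are assumptions.
import csv
import io
import zipfile

def read_crs_txt(zip_path, member=None, delimiter="|"):
    """Extract one .txt member from a CRS bulk zip and yield rows as dicts."""
    with zipfile.ZipFile(zip_path) as zf:
        if member is None:
            # Default to the first .txt file found in the archive.
            member = next(n for n in zf.namelist() if n.endswith(".txt"))
        with zf.open(member) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8", errors="replace")
            yield from csv.DictReader(text, delimiter=delimiter)
```

`zip_path` can be a filesystem path or any file-like object, so the same function works on a freshly downloaded in-memory archive.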

decentral1se commented 2020-06-10 16:57:24 +02:00 (Migrated from github.com)

OK, makes a lot of sense, will do!

decentral1se commented 2020-06-15 21:49:35 +02:00 (Migrated from github.com)

@nickdickinson hey, I've taken a dive into this again and actually, there isn't much we need to do. The https://github.com/datasets/dac-and-crs-code-lists people actually publish their data to https://datahub.io/core/dac-and-crs-code-lists, which can be manipulated directly using the https://github.com/frictionlessdata/datapackage-py library from the open-standards folks at https://frictionlessdata.io!

For example, we can download and manipulate CRS data with only the few lines shown at https://datahub.io/core/dac-and-crs-code-lists#python. That means we don't need https://github.com/WASHNote/WASHWeb/issues/36, because the data is already packaged in this open format, which is supported by tooling like https://pandas.pydata.org (you can load the dataset directly into a Pandas frame!)

The tools to download and put it into structured data are already built :) Furthermore, we could easily follow this model for other datasets that we want to follow up with.

So then I'd propose that we think about https://github.com/WASHNote/WASHWeb/issues/35 (perhaps closing off this and https://github.com/WASHNote/WASHWeb/issues/36) and skip directly to what we want to learn from this dataset. I haven't looked at the open data ecosystem for a while but it seems to have matured.
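To illustrate what the Data Package format gives us, here is a stdlib-only sketch of the core idea: a `datapackage.json` descriptor maps resource names to tabular files. The descriptor and rows below are made up for illustration, not the real dac-and-crs-code-lists contents; in practice datapackage-py fetches and parses the published descriptor for you.

```python
# Minimal stdlib illustration of the Frictionless Data Package idea.
# The descriptor and CSV content here are invented examples.
import csv
import io
import json

descriptor = json.loads("""
{
  "name": "example-code-lists",
  "resources": [
    {"name": "purpose-codes", "path": "purpose-codes.csv", "format": "csv"}
  ]
}
""")

# Stand-in for the file the descriptor points at (normally fetched over HTTP).
files = {"purpose-codes.csv": "code,description\n14030,Basic drinking water supply\n"}

def read_resource(descriptor, files, name):
    """Look up a resource by name and parse its CSV rows into dicts."""
    resource = next(r for r in descriptor["resources"] if r["name"] == name)
    return list(csv.DictReader(io.StringIO(files[resource["path"]])))

rows = read_resource(descriptor, files, "purpose-codes")
```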

nickdickinson commented:

This is great. Let's explore. The idea of the export views is to be able to generate a new "view", in the SQL sense, of this dataset with only the WASH sector codes, sorting the donors and creating WASHWeb org IDs. That can then be used to start populating Wikidata: for example, adding donors and their USD disbursements for a given year. Clearly we also need to map orgs between Wikidata and CRS.
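A minimal `sqlite3` sketch of that "view in the SQL sense": filter to WASH purpose codes and aggregate disbursements per donor per year. The table layout, column names, and the 140xx code filter are assumptions for illustration, not the actual CRS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crs (
    donor_name TEXT,
    purpose_code TEXT,
    year INTEGER,
    usd_disbursement REAL
);
-- WASH-only view: assume water/sanitation purpose codes start with 140
-- (verify against the DAC purpose-code list before relying on this).
CREATE VIEW wash AS
SELECT donor_name, year, SUM(usd_disbursement) AS usd_total
FROM crs
WHERE purpose_code LIKE '140%'
GROUP BY donor_name, year
ORDER BY usd_total DESC;
""")
conn.executemany(
    "INSERT INTO crs VALUES (?, ?, ?, ?)",
    [("Denmark", "14030", 2018, 2.0),
     ("Denmark", "14030", 2018, 1.0),
     ("Denmark", "11110", 2018, 9.0),   # education code, excluded by the view
     ("Norway", "14020", 2018, 5.0)],
)
rows = conn.execute("SELECT * FROM wash").fetchall()
# rows: [("Norway", 2018, 5.0), ("Denmark", 2018, 3.0)]
```

The org-ID creation and Wikidata mapping would then key off the `donor_name` column in this view.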

decentral1se commented 2020-06-16 14:08:16 +02:00 (Migrated from github.com)

Oh, took a quick pass in the end today :)

And look mah, no infrastructure required!

https://github.com/WASHNote/notebooks#notebooks

Just hit the following URL and you can already see a "Hello, World" example:

https://mybinder.org/v2/gh/WASHNote/notebooks/master

We can prove out exactly what we need starting from these notebooks.

Speak tomorrow 🌈

nickdickinson added this to the WASHWeb-2019 project 2023-11-14 10:48:27 +01:00
Reference: WASHWeb/WASHWeb-2019#33